A case study of trace-driven simulation for analyzing interconnection networks: cc-NUMAs with ILP processors
Abstract
Evaluating network performance under real application loads requires detailed simulations that are both time-intensive and resource-intensive. Moreover, the use of ILP processors in cc-NUMA architectures introduces non-deterministic memory accesses; the resulting parallel system must be modeled by a detailed execution-driven simulation, further increasing the evaluation cost. This work introduces a simulation methodology, based on network traces, to estimate the impact that a given network has on the execution time of parallel applications. This methodology allows the network design space to be studied with a level of accuracy close to that of execution-driven simulation but with much shorter simulation times. The network trace, extracted from an execution-driven simulation, is processed to replace the temporal dependencies produced by the simulated network with an estimate of the message dependencies caused by both the application and the cache-coherence protocol in use. The methodology has been tested on two direct networks, with 16 and 64 nodes respectively, running the FFT and Radix applications of the SPLASH2 suite. The trace-driven simulation is 3 to 4 times faster than the execution-driven one, with an average error of 4% in total execution time.
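The idea of replaying a trace by message dependencies rather than by recorded timestamps can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Message` record, the `gap` field (local cycles between the arrival of the last dependency and the injection of the message), the `replay` function, and the contention-free `ring_latency` model are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """One trace record (hypothetical layout, not the paper's trace format)."""
    msg_id: int
    src: int
    dst: int
    deps: list = field(default_factory=list)  # ids of messages that must arrive first
    gap: int = 0  # local cycles between the last dependency arriving and injection


def replay(trace, latency):
    """Estimate the completion time of `trace` under the model `latency(src, dst)`.

    Assumes the trace is ordered so that every dependency precedes its dependents.
    """
    delivered = {}
    for m in trace:
        # A message becomes ready once all messages it depends on have arrived.
        ready = max((delivered[d] for d in m.deps), default=0)
        inject = ready + m.gap
        delivered[m.msg_id] = inject + latency(m.src, m.dst)
    return max(delivered.values(), default=0)


def ring_latency(n, per_hop):
    """Contention-free latency on an n-node bidirectional ring (illustrative only)."""
    def latency(src, dst):
        d = abs(src - dst)
        return per_hop * min(d, n - d)
    return latency


trace = [
    Message(0, src=0, dst=2, gap=5),            # request
    Message(1, src=2, dst=0, deps=[0], gap=3),  # reply depends on the request
    Message(2, src=0, dst=1, deps=[1], gap=2),  # next request waits for the reply
]

fast = replay(trace, ring_latency(4, per_hop=2))
slow = replay(trace, ring_latency(4, per_hop=4))
```

Because injection times are derived from dependencies instead of being fixed, the same trace yields a different estimated execution time for each candidate network, which is the property the methodology relies on. A real network model would also account for contention, which this sketch ignores.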
Similar resources
Exploring the Switch Design Space in a CC-NUMA Multiprocessor Environment
The switch design for interconnection networks plays an important role in the overall performance of multiprocessors and computer networks. It is therefore crucial to study various factors in the switch design space and their influence on the system performance. In this paper we first propose a 4-D framework for the design of input queuing switches with wormhole routing and virtual channels. Th...
Evaluation of memory latency in cluster-based cache-coherent multiprocessor systems with different interconnection topologies
This research investigates memory latency of cluster-based cache-coherent multiprocessor systems with different interconnection topologies. Each node in a cluster includes a small number of processors and a portion of the shared memory, which are all connected through a split-transaction bus. Each processor has two levels of caches. As the number of processors in a node is small, a snoopy cache ...
Parallel Architectures and Applications (Cooperative Project) TIN2004-07440-C02 (01 and 02)
This research project deals with interconnection technologies for parallel and distributed computing systems. Nowadays, interconnection networks are ubiquitous in the computing field. At the chip level, networks are present in clustered microprocessors, on-chip multiprocessors and systems on chip. Some workstations and servers begin to use switched networks to interconnect their different modul...
Edinet: An Execution Driven Interconnection Network Simulator for DSM Systems
Evaluation studies on interconnection networks for distributed memory multiprocessors usually assume synthetic or trace-driven workloads. However, when the final design choices must be made, a more precise evaluation study should be performed. In this paper, we describe a new execution-driven simulation tool to evaluate interconnection networks for distributed memory multiprocessors using real a...
Head-Driven Simulation of Water Supply Networks
Up to now most of the existing water supply network analyses have been based on demand-driven simulation models. These models assume that nodal outflows are fixed and are always available. However, this method of simulation neglects the pressure-dependent nature of demand that is characterized by changes in actual nodal outflows particularly during critical events like major mechanical or hydra...
Publication date: 2000